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ABSTRACT 



A query performance prediction ("QPP") module is pro- 
vided as part of an applications layer residing on respective 
user stations operating in conjunction with a centralized host 
computer system. The QPP module correlates estimated 
system cost information for a database query provided from 
a database management system associated with the com- 
puter system with statistics compiled from previous queries 
in order to estimate the system response time. In particular, 
an estimated CPU time for executing the present query is 
derived by extrapolating actual CPU times recorded for past 
queries having the closest estimated costs for accessing the 
same, or similar, tables and items in the database, using a 
form of a "nearest neighbor" algorithm to match the present 
query to either identical or statistically closest past queries^ 
The estimated CPU time for the present query is then 
multiplied by a current ratio of total elapsed times-to-CPU 
times for the system to produce an estimated total elapsed 
time for responding to the query. 



20 Claims, 4 Drawing Sheets 
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SYSTEMS AND METHODS FOR In a preferred embodiment, a query performance predic- 

ESTIMATING QUERY RESPONSE TIMES IN tion ("QPP") module is incorporated as part of an applica- 

A COMPUTER SYSTEM tion residing on respective user stations connected to the 

FIELD OF THE INVENTION computer system. The QPP module correlates estimated 
r-rm 4 . . „ .5 system cost information provided for each new query from 

The present invention pertains generally to computer V „ , t^™*o -.l . -i j r 

systems and, more particularly, to methods and apparatus for ^ com P uter ° BMS wth s,atl f cs from 
estimating response times for database queries in [computer P^ious queries m order to estimate the system response 

systems. "me to the present query. 

„ . „.,„„„. ,„ m „„ m „ ™ „ . , In particular, for each new query, a cost optimizer located 

BACKGROUND OF THE INVENTION 10 the system 4 D > MS dete ,J rmines a mosl 

In a computer system operating in conjunction with a efficient execution plan for accessing the requested data, 
database (e.g., a relational database), data is typically stored along with an estimate of the corresponding (relative) sys- 
in the form of tables, with each columo of a given table tcm "cost" f or executing the query. This estimated cost is 
representing a particular data attribute and each row repre- returned to the QPP module at the respective user station, 
senting a specific record. A user of the computer system 15 which employs a "nearest neighbor" algorithm to determine 
accesses data from one or more tables by submitting a query an estimat e d central processing unit ("CPU") time required 
to a database management system ("DBMS") associated for exe cuting the present query by extrapolating CPU times 
with the computer system The DBMS responds to the query recorded for past queries havjng ^ same or similar es(i . 
by constructing a particular view of the data stored in the mated costs for accessing the same, or similar, tables and 
database and returning a responsive data set to the user. 20 i|ems in the database . ^ estima t ed CPU time for the 
It is common for users of such computer systems to present query is then multiplied by a current ratio of total 
submit ad hoc database queries, i.e., self-structured requests elapsed times to CPU times to produce an estimated total 
of any type and/or bounds, with no up-front knowledge or elapsed time for the system lo respond to the present query, 
information as lo how long the computer system will actu- , n accordance with one of the invention, each user 
ally take to respond. In particular, databases can be very 25 station records rtinent mformation for each new ^ 
large and, depending upon various factors, such as the query> which is collec , ed and maintained b the host com . 
current operational load on the computer system and the ter ^ in a hist ^ hist preferably 
complexity of a particular query, the elapsed time required - mc [^ s iof each query an identification of the particular 
to execute the query can vary widely-e.g., from less than taWe(s) and columns that were accessed| whether the query 
a second to several hours, or even days. 30 was satisfied from , pre . computed the estimated 
This uncertain system response time may deter users from cost for the querV; t h e actual CPU time used in running the 
submitting certain queries, since they will not know whether query and t he total elapsed time required to complete the 
they will get an immediate answer, or will have to wait for query and respond to the ^ recorded que ry history 
an extended period of time. The situation is especially information is loaded by the computer system into the QPP 
problematic for computer system administrators who control 35 modu i es of each user stat j on; e .g., whenever a user "logs on" 
the amount of system resource time that is expended by t o the computer system, or after specified time intervals, 
users submitting ad-hoc query requests. For example, it is A readi , , advanl ined b provid i ng an 
not uncommon for multiple users to submit overlapping estimated nse time t0 a ^ rior t0 their maW 
queries to a centralized computer system from separate a decision Qn whethef tQ ed ^ a fc ^ ^ 
respective user slatons, e.g personal computers or network 40 rfes ^ ^ ^ ^ leted within an aUowable or 
terminals, potentially degrading overall system performance acceptabk time &ame are not even sM> regardless of 
an e ciency. whether the decision to proceed is based on a preset thresh- 
As a result, system administrators have been known to old limit> or ^ discretion . For example, if a query is 
establish processing resource time quotas, whereby users are est i ma ted to take an unacceptable length of time for execu- 
allowed only a certain period of lime in which to obtain a 45 tiQn by ^ systen ^ a ^ may simply cancd the query and 
response to any given system query. Under these move on to anolher task> such as submitting an alternate 
circumstances, the time period begins to run when a user query ThuSj an advance estimatiori 0 f system response times 
submits a query. If a response is not returned by the system t0 data5ase queries can result in improved system efficiency, 
before a preset time limit is reached, the DBMS abandons system management and user satisfaction. 

any further execution of the query, leaving the user with no 50 . . 

r # At _ . Other and further objects, features, aspects, and advan- 

result whatsoever, even if a response may have otherwise . , . .* *nu u j . j 

, , , ■ . * i . ^ i - i • tages of the present invention will become better understood 

been produced in just an instant later. Although this f . r u • j t A * • c *u c a 

, L . , v iL i , i, with the following detailed description of the preferred 

(somewhat crude) methodology may preserve overall sys- , *. . f 4 , . r . , . 

; a~ ■ r L embodiments illustrated in the accompanying drawings, 

tem resources, it may decrease efficiency as a result of the ° ° 

wasted system resource time expended for processing que- 55 BRIEF DESCRIPTION OF THE 

ries that are abandoned at the time limit. ACCOMPANYING DRAWINGS 

Thus, it would be desirable to provide an up-front esti- ^ drawings iUustrate bolh lhe design and utm of 

mation of the system response time required for returning a preferre d embodiments of the present invention, in which: 

response to individual database queries prior to their actual 1A . , . ,. - . 

^ t , • * j j c *n FIG. 1A is a simplified block diagram of an exemplary 

execution, so that system resource time is not expended for ou . r . to . ■ , , , 

4l _ . . 4 . u . i » ... computer system operating in coni unction with a database; 

those queries that will not be completed withm an allowable r J r to J ' 

or acceptable time period. FIG - 1B * a simplified block diagram of an exemplary 

user station and an exemplary host computer of FIG. 1A; 

SUMMARY OF THE INVENTION FIG. 2 is a high-level flow chart depicting a preferred 

The present invention provides methods and apparatus for 65 process for obtaining database query response time esti- 

providing an estimate of the elapsed time required for a mates in the computer system of FIG. 1A, including use of 

computer system to respond to database queries. a query performance prediction ("QPP") module; 



05/15/2003, EAST version: 1.03.0002 



6,026 : 

3 

FIG. 3 is a simplified block diagram illustrating a statis- 
tical database of query records compiled by a database 
management system in the computer system of FIG. 1A; and 

FIG. 4 is a flow chart depicting a preferred process for 
estimating query response times by the QPP module of FIG. 5 
2. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 

In the following description, for purposes of explanation, 10 
numerous specific details are set forth in order to provide a 
thorough understanding of the present invention. It will be 
apparent, however, to one skilled in the art, that the present 
invention may be practiced without these specific details. In 
other instances, well-known structures and devices are 15 
shown in block diagram form in order to avoid unnecessarily 
obscuring the present invention. 

A computer system generally may take many forms, from 
a configuration including a variety of processing units, as 
further described herein, networked together to function as 20 
a integral entity, to a single computer, e.g., a personal 
computer, operational in a stand-alone environment. The 
present invention can be embodied in any of these computer 
system configurations. 

Referring to FIG. 1A, in a presently preferred 25 
embodiment, a computer system 20 includes a host com- 
puter 22 connected to a plurality of individual user stations 
24. In a presently preferred embodiment, the user stations 24 
each comprise suitable data terminals, such as, e.g., personal 3Q 
computers, portable laptop computers, or personal data 
assistants ("PDAs"), which can store and independently run 
one or more applications (i.e., programs). For purposes of 
illustration, some of the user stations 24 are connected to the 
host computer 22 via a local area network ("LAN") 26. ^ 
Other user stations 24 are remotely connected to the host 
computer 22 via a public telephone switched network 
("PSTN") and/or a wireless network 30. 

In a presently preferred embodiment, the host computer 
22 operates in conjunction with a data storage system 31, 4Q 
wherein the storage system 31 contains a database 32 that is 
readily accessible by the host computer 22. In a presently 
preferred embodiment, the database 32 is a relational data- 
base. 

In alternative embodiments, the database 32 may be 45 
resident on the host computer, stored, e.g., in the host 
computer's ROM, PROM, EPROM, or any other memory 
chip, and/or its hard disk. In yet alternative embodiments, 
the database 32 may be read by the host computer 22 from 
one or more floppy disks, flexible disks, magnetic tapes, any 50 
other magnetic medium, CD-ROMs, any other optical 
medium, punchcards, papertape, or any other physical 
medium with patterns of holes, or any other medium from 
which a computer can read. 

The host computer 22 includes a database management 55 
system ("DBMS") 34, which is one or more programs and/or 
hardware circuitry, configured to access data stored in the 
database 32. In a presently preferred embodiment, each of 
the user stations 24 includes its own database applications 
layer 36, which is one or more programs and/or hardware go 
circuitry configured to interact with the DBMS 34. 

The respective database applications layer 36 of a user 
station 24 allows a user of the computer system 20 to access 
data from the database 32 by forming and submitting queries 
on the user station 24. In particular, in a presently preferred 65 
embodiment, a user-submitted query is converted into an 
SOL statement by the database applications layer 36 resident 
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in the respective user station 24. SQL is a standard language 
used in relational database management systems. An SQL 
query is the respective user query formatted in SQL. The 
SQL query is issued to the DBMS 34, which executes the 
SQL query and returns a responsive data result set to the 
user. 

Although SQL is a presently preferred language, other 
languages may alternatively be used for executable queries. 
In other alternative embodiments, a user query may be 
submitted to a respective database applications layer 36 in 
an appropriate language format for execution by the DBMS 
34. 

In alternative embodiments, an equivalent of the database 
applications layer 36 may reside on a server system (e.g., a 
SUN© SPARCstation™) with one or more user stations 24 
(i.e., "thin clients") either locally, or remotely, connected to 
the server system. In this alternative configuration, the thin 
clients support an appropriate user interface, which is gen- 
erally one or more programs that assist a user to access the 
database 32. In other alternative embodiments, the user 
stations 24 are "dumb" terminals. In this alternative 
configuration, an equivalent of the database applications 
layer 36 may reside on the host computer 22, along with the 
DBMS 34, or, alternatively, can reside on a separate, second 
computer (not shown) which interfaces with the host com- 
puter 22. 

Referring to FIG. IB, in a presently preferred 
embodiment, each user station 24 and the host computer 22 
(each referred to generally as a processing unit) embodies a 
general architecture 5. A processing unit includes a bus 6 or 
other communication mechanism for communicating 
instructions, messages and data (collectively, information), 
and one or more processors 7 coupled with the bus 6 for 
processing information. A processing unit also includes a 
main memory 8, such as a random access memory (RAM) 
or other dynamic storage device, coupled to the bus 6 for 
storing dynamic data and instructions to be executed by the 
processors) 7. The main memory 8 also may be used for 
storing temporary data (i.e., variables) or other intermediate 
information during execution of instructions by the 
processors) 7. 

A processing unit may further include a read only memory 
(ROM) 9 or other static storage device coupled to the bus 6 
for storing static data and instructions for the processors) 7. 
A storage device 10, such as a magnetic disk or optical disk, 
may also be provided and coupled to the bus 6 for storing 
data and instructions for the processor(s) 7. 

A processing unit may be coupled via the bus 6 to a 
display 11, such as a cathode ray tube (CRT), for displaying 
information to a user. An input device 12, including alpha- 
numeric and other keys, is coupled to the bus 6 for com- 
municating information and command selections to the 
processors) 7. Another type of user input device may 
include a cursor control 13, such as a mouse, a trackball, a 
fingerpad, or cursor direction keys for communicating direc- 
tion information and command selections to the processors) 
7 and for controlling cursor movement on the display 11. 

According to one embodiment of the invention, the indi- 
vidual processing units perform specific operations by their 
respective processors) 7 executing one or more sequences 
of one or more instructions contained in the main memory 
8. Such instructions may be read into the main memory 8 
from another computer-readable medium, such as the ROM 
9 or the storage device 10. Execution of the sequences of 
instructions contained in the main memory 8 causes the 
processors) 7 to perform the process steps described herein. 
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In alternative embodiments, hard-wired circuitry may be A processing unit may transmit and receive messages, 
used in place of or in combination with software instructions data, and instructions, including program, i.e., application, 
to implement the invention. Thus, embodiments of the code, through its respective communication link 15 and 
invention are not limited to any specific combination of communication interface 14. Received program code may 
hardware circuitry and/or software. 5 be executed by the respective processor(s) 7 as it is received, 

The term "computer-readable medium", as used herein, and/or stored in the storage device 10, or other associated 
refers to any medium that provides information to the non-volatile media, for later execution. In this manner, a 
processors) 7. Such a medium may take many forms, processing unit may receive messages, data and/or program 
including, but not limited to, non-volatile, volatile and code in tne of a carrier wave. ^ 
transmission media. Non-volatile media, i.e., media that can ]Q Referring to JIG. 2, in accordance with a general aspeci, 
retain information in the absence of power, includes the of the invention^ a'ntd hoc query 40 is converted into a SQfc^ 
ROM 9. Volatile media, i.e., media that can not retain statement by the database applications layer 36 residing in 
information in the absence of power, includes the main the respective user station from which the query 40 is input 
memory 8. Transmission media includes coaxial cables, by a user. The SQL statement is transmitted to a cost 
copper wire and fiber optics, including the wires that com- 15 optimizer 42 module associated with the DBMS 34 at the 
prise the bus 6. Transmission media can also take the form host computer 22. The cost optimizer 42 considers available 
of carrier waves; i.e., electromagnetic waves that can be access paths to the requested data from the database 32, and 
modulated, as in frequency, amplitude or phase, to transmit derives a plan for executing the query that is most efficient 
information signals. Additionally, transmission media can based on statistics maintained in a data dictionary associated 
take the form of acoustic or light waves, such as those 20 with the respective table(s), along with their associated 
generated during radio wave and infrared data communica- clusters and indexes, accessed by the SQL statement repre- 
tions. f^k^^rM^vJ sentm S me q uerv 40. 

Common forms of computer-readable media incluue, for /^L. In particular, the cost optimizer 42 generates a set of 
example: a floppy disk, flexible disk, hard disk, magnetic/" potential execution plans (not shown) for executing the 
tape, any other magnetic medium, CD-ROM, any other Irrespective query SQL statement (40) based on the available 
optical medium, punchcards, papertape, any other physical ^a ccess paths, and ^stimates the relative "cost" of each 
medium with patterns of holes, RAM, ROM, PROM (i.e., "potential" execution plan based on the^data distribution^and 
programmable read only memory), EPROM (i.e., erasable storag e characteristics f or the respective tables^clusters and 
programmable read only memory), including FLASH- indexes to~b e use d v Execution plans with greater relative 
EPROM, any other memory chip or cartridge, carrier waves, 3 ^-'cost5 generally" take more time to execute than those with 
or any other medium from which a processor 7 can retrieve smaller relative costs, although the relationship between cost 
information. and execution time is by no means linear, and depends on 

Various forms of computer-readable media may be manv other factors. An exemplary preferred cost optimizer 
involved in providing one or more sequences of one or more 35 ^ provided in the Oracle7.3™ version of the Oracle® 
instructions to the processors) 7 for execution. For example, database product produced and distributed by Oracle 
the instructions may initially be provided on a magnetic disk Corporation, Redwood City, Calif., USA. 
of a remote computer (not shown). The remote computer The cost optimizer 42 compares the estimated costs of the 
may load the instructions into its dynamic memory and then potential execution plans and returns the smallest estimated 
transit them over a telephone line, using a modem. A modem 4Q cost 44, along with a result set 45 representing the selected 
local to the processing unit may receive the instructions on execution plan for the input query 40, to a query perfor- 
a telephone line and use an infrared transmitter to convert mance prediction ("QPP") module 46 residing within the 
the instruction signals transmitted over the telephone line to database applications layer 36 of the respective user station 
corresponding infrared signals. An infrared detector (not _£4. The QPP module 46 compares the estimated cost 44 and 
shown) coupled to the bus 6 may receive the infrared signals 45 result set 45 for the present query 40 to the recorded 
and place the instructions therein on the bus 6. The bus 6 estimated costs and result s gjs of past queries, searching out 
may carry the instructions to the main memory 8, from those past queries having the same or similar estimated costs 
which the processor(s) 7 thereafter retrieves and executes for accessing the same, or similar, tables and items in the 
the instructions. The instructions received by the main database located in a query statistics cache 48 associated 
memory 8 may optionally be stored on the storage device 10, 5Q with the database applications layer 36. 
either before or after their execution by the processors) 7. — More particularly, referring additionally to FIG, 3, each 

Each processing unit may also include a communication user station 24 records pertinent information for each new 
interface 14 coupled to the bus 6. The communication user query, which are collected and maintained by the host 
interface 14 provides two-way communication between the computer 22 in a query history 50. In a presently preferred 
respective user stations 24 and the host computer 22. The 55 embodiment, the query history 50 includes: the query ID 58, 
communication interface 14 of a respective processing unit the date 60 and time 62 at which the query was executed; the 
transmits and receives electrical, electromagnetic or optical user ID 64 of the user station 24 submitting the query; an 
signals that include data streams representing various types identification of the result set 66 for the query — i.e., the 
of information, including instructions, messages and data. particular table(s) and column(s) that were accessed; 

A communication link 15 links a respective user station 60 whether the query was satisfied from a pre-computed sum- 
24 and a host computer 22. The communication link 15 may mary table 68; the estimated cost 70 for the query; the actual 
be a LAN 26, in which case the communication interface 14 cpu time 72 used in running the query; and the estimated 
may be a LAN card. Alternatively, the communication link time 73 and actual total elapsed time 74 required for the 
15 may be a PSTN 28, in which case the communication system 20 to respond to the query. 

interface 14 may be an integrated services digital network 65 The recorded query history 50 is preferably loaded into 
(ISDN) card or a modem. Also, as a further alternative, the the query statistics cache 48 of the respective QPP modules 
communication link 15 may be a wireless network 30. of the user stations 24 whenever a user "logs on" to the 
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computer system, or after a specified time interval since a of relatively heavy and relatively light system usage, 
last update. The query history 50 is preferably kept current, Likewise, a multiplying factor for individual users or groups 
so that ongoing changes in the computer system 20 or of users may be used in the response time estimation process 
database 32 do not significantly skew the statistical corre- in order to take into account differences caused by trans- 
lation of past query results with future query response time 5 mission time variances due to some users being located 
estimations. more remotely from the computer system, or being con- 
Referring additionally to FIG. 4, the QPP module 46 nected by a slower transmission medium, than other users, 
compares 80 the cost, estimate 44 and result set 45 for the Thus, while preferred embodiments of methods and appa- 
present query 40 to the recorded estimated costs and result ra tus for estimating the time required for a computer system 
sets of past queries stored in the query statistics cache 48. If ™ to respond to database queries have been shown and 
an exact match is found between the present query 40 and a described, it will be apparent to one of ordinary skill in the 
recorded past query 58 stored in the cache 48 (i.e., wherein art that numerous alterations may be made without departing 
both queries have the same estimated costs for the same f r0 m the spirit or scope of the invention. 

result sets), the QPP module 46 selects the recorded actual „,„„ nf av «„„t a ,i,u™,„u „ hdd p~, ♦ 

nnn , * t , t . . , eo t . 1C By way or example, although a QPP module for estimat- 

CPU time 72 of the matching past query 58 as an estimated 15 . r ' ° 

nnTT4 . 0 ~ c tU f ah ing query response times has been disclosed and described 

CPU time 82 for the present query 40. & . . 3 A f , . „. . „. t „ . r 

r— r h j as being deployed m intelligent user stations (i.e., with 

If no exact match is found, a nearest neighbor algorithm separa te data storage and computing abilities), it will be 

83 is employed to extrapolate an estimated CPU time 84 readily apparem to those skilled in the art that the invention 

based on a weighted average of CPU timesforjhe^cjosesi may be equally adapted fof use with alterDat e system 

i mat ching stored que ries — i.e., from the "nearest neighbors" 2U architectures 

of the present query 40 based on the respective recorded _ t 

L estimated costs and result sets of the past queries. Notably, ^or example the QPP module could be adapted to reside 

r a past query having an identical result set was executed a u nd ™* m the , host computer, along with (or as part oQ 

using a summary table, the estimated response time can be * e DBMS. An advantage of this alternate configuration is 

approximated to a selected (e.g., minimum) value. A pre- * < hat . use , r f e ? es b f in P ut from network terminals 

ferred methodology for the use of summary tables is pro- having little (or no) independent storage or Processing 

vided in U.S. patent application Ser. No. 08/962,533, capabilities .In a still further alternate embodiment, the QPP 

entitled "Summary Table Query Routing," filed on the same mo ^ ule could be ada P ted to reside and °P erate ™ a " inter " 

day as the present application and fully incorporated herein mediar y S atewa y Krvcr s y stem that connects multl P le user 

by reference. A preferred methodology for creating sum- 30 stations to one or more separate host computer systems, 

mary tables is provided in U.S. patent application Ser. No. Wlth * 15 later configuration, query response time estimates 

08/962,029, entitled "Summary Table Management In A ^an advantageously be made for queries involving multiple 

Computer System," which was also filed on the same day as dlfferent Cabases and/or combinations thereof, 

the present application and which is also fully incorporated In accordance with volume 37, section 1.96 of the Code 

herein by reference. 35 of Federal Regulations, a preferred computer program for 

Apresently preferred embodiment of the nearest neighbor estimating the time required for a computer system to 

algorithm employed by the QPP module 46 is disclosed in respond to a database query as deployed in a presently 

the form of a computer program listing attached below. preferred applications program in accordance with the teach- 

Tne estimated CPU time is multiplied 85 by a current ratio 4Q in S s of the P resenl invention is as follows: 
52 of total elapsed time-to -CPU time for the computer 

system 20 to produce an estimated total elapsed query ^ ________ 

response time 54. The ratio 52 of total elapsed time-to-CPU DCENumber 

time is preferably calculated by the DBMS 34 based on all DCEQuerypredictionStats:: 

user database queries made in the system 20, regardless of EstimateElapsedTime (const DCENumber& cost, 

the particular tables and items accessed, and is updated to , ni nTArcwr , OT v ® CE f™ g t obj " ^n!^ k n 

r " { OLDIAGSTRT (OLX"GetInternalDLagStream()> kDCEDebug, 1) 

V include only the most recent queries in order to reflect "DCEQueryPredictionStats::EstimateElapsedTime()" « endl 

\^ Current System Operating Conditions. « " Estimating time for cost: " « cost « endl 

In accordance with a further general aspect of the „ „ <<: 0LDIAGEND ; 

, A . 1.111 // If user preferences have turned off stats then return zero 

invention, the estimated total elapsed query response time 50 if ( mPerformOpp == false ) 

54 is then returned to the user so that a decision (whether { OLDIAGSTRT (OLX-GetlnternalDiagStreamO, kDCEDebug, l) 

automatic or manual) can be made as to whether to continue « "DCEQueryPredictionStats::EstimateElapsedTLmeO" «endl 

With the query 40. As noted above, the estimated total <<" User prefs turned off QPP - returning estimated time 

7 mA . i . . 1 ■ -n /• °f° « endl 

response time 54 is also stored in the query history 50 (i.e., K< oldiagend; 

item 73), in order to later be compared to the actual elapsed 55 return 0; } 

response time (item 74) and thereby provide feedback // If cost is »«> or nuI1 then retu m zero 

information regarding the accuracy of the response time f ^SS^fflS:^L>^u-»0, kDCEDebug, :) 

estimation process. K< "DCEQueryPredictkmStats: Estimate Elapse dTimeO" « endl 

Further considerations may be included in the response « " QPP cost is zero or null - returning estimated time 

time estimation process, if desired, include statistical evi- 60 << qldlagend 

dence of estimated times versus actual response times, return o- } 

Whereby each response time estimate can be Statistically // If there are no stats then return an estimated time of 1 

qualified by a calculated degree of certainty (i.e., by a "plus second. 

or minus x%") based on a comparison of past results. l f iT^ at i^ s Jf!? V ^ , . „ _ _ , 

7 r . . I. . . , j { OLDIAGSTRT (OLX "Get Internal DiagStream(), kDCEDebug, 1) 

In alternate preferred embodiments, the time and date 65 « "DCEQueryPredictionStats-EstimateElapsedTuneO" «cndl 

Statistics Of the past queries may also be used in Order tO « " No stats - returning estimated time of 1" « endl 

factor in differences in elapsed system time between periods 
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« OLDIAGEND; 

return DCENumber(l); } 
// If an object use key has been specified then we may want to 
// look for all stats for that key value. A load is not 
// attempted if the user prefs have turned off this feature or if 
}} the key has already been loaded 
if ( JobjectUseKey.IsEmptyO && 

m LoadStatsByObjectUseKey & & 

mLoadedObjectUseKeys.find(objectUseKey) — 

mLoadedObjectUselCeys.endO ) 

{ Loadstatistics(objectUseKey); 

// record that we've loaded this object use key so we don't 
// attempt to load it again 
bool noClash - 

(mLoadedObjectUseKeys.insert(objectUscKey)). second; 

OL_ASSERT( noClash — true ); } 
// Decide whether to use the whole set of stats or a subset based 
//upon the object use key. 

DCEQueryStatSet* pCurrentStatSet; 

DCEOueryStatSet statsSubset; 

if ( mpStatSet->ContainsObjectUseKey(objectUseKey) ) 
{ // we have some stats specific to the set of objects we are 
// trying to query - let's extract those as a subset and use 
// them 

mpStatSet->ExtractStatsByObjs(objectUseKey f statsSubset); 
pCurrentStatSet = &statsSubset; } 

else 

{ // no stats exist for this set of objects - use the whole set 

// of existing stats for estimating the query time 

pCurrentStatSet = mpStatSet; } 
// Get references to the cost and date indexes we need to use 

const DCEStatisticsPByCost& statsByCost - 

pCurrcntStatSet->GetStatsByCost 0; 

const DCEStatisticsPByDate& statsByDate = 

pCu rre ntStatSet- >GetStatsByDate(); 
// If we are using cpu time then find out the cpu/e lapsed time 
// ratio, otherwise assume a ratio of 1 

DCENumber weigh tedTimeRatio - 1; 

if ( mUseCpuTime ) 
{ // Estimate the cpu/elapsed time ratio by performing a 

// weighted average on the 10 newest stats (ignoring those 

// with zero or null cpu times) 

OLDIAGSTRT (OLX::GetInternalDiagStream(), kDCEDebug, 1) 
« "DCEQueryPredictionStats::EstimateElapsedTimeO" « endl 
« " Estimating cpu/elapsed time ratio..." « endl 

« OLDIAGEND; 

DCENumber runningTotal = 0; 

DCENumber ratio; 

DCENumber divider = (int)0; 

# ifdef _DEBUG 
{ DCEStatisticsPByDateCttr tastElement - statsByDate.end(); 
OL_ASSERT( slatsByDate.emptyO |j 

(" (statsByDate.beginO))->datestamp >= 
(* (--lastElement)) ->dateStamp ); } 
# endif 

DCEStatisticsPByDatecItr dateltr = statsByDate.beginO; 
for (int i - 0; i < 10 && dateltr !- statsByDate .endO; 
dateltr++) 

{ DCENumber actCpu - (*dateItr)->actualCpuTime; 

DCENumber actElapsed = (*dateItr)->actualElapsedTime; 
// ignore stats with null or zero cpu times and zero elapsed 
// times 

if ( actCpu. is_null() |j actCpu «= 0 jj actElapsed — 0 ) 
continue; 

ratio - (OLNumber) ( actCpu ) / (OLNumber) ( actElapsed ); 

OLDIAGSTRT (OLX::GetInternalDiagStream0, kDCEDebug, 2) 
« " Using stat, actualCPU:" « actCpu 
« **, actual Elapsed: " « actElapsed 

« ratio: " « ratio « endl 

« OLDIAGEND; 

runningTotal +- ratio * (10 - i); 

divider +- (10 - i); 

! ++; . . > 

if ( divider 0 [| runningTotal «» 0 ) 
weightedTime Ratio - 1; 

weight edTime Ratio » runningTotal / OLNumber(divLder); } 
OLDIAGSTRT (OLX::GetInternalDiagStreamQ, kDCEDebug, 1) 
« "DCEQueryPredictionStats::EstimateElapsedTimeO M « endl 
« " Weighted time ratio: " « weightedTime Ratio « endl 



■continued 



« OLDIAGEND; 
// Searching by cost find the 10 "nearest" stats 
5 // note that in the following code, the time (cpu or elapsed) 
// used depends on the value of mUseCpuTime. 
// First create a dummy object to use for searching 
DCEQPPStatistic stat; 
stat.cost = cost, 
// Find the stat within the existing stats with the closest cost 
10 DCEStatisticsPByCostCItr nearestCost - 
statsByCost.lower_bound(&stat); 
// Calculate the weighted avg cost and time in 3 stages: 
DCENumber costRunningTotal - 0; 
DCENumber time RunningTotal - 0; 
DCENumber divider - (int)0; 

35 DCEStatisticspByCostCItr costltr; 
int offset; 
// Check that we are not pointing at the end and that there is 
// something there at all.... 
if ( nearestCost statsByCost.endO ) 
nearestCost--; 

2Q OLDIAGSTRT (OLX-GetlnternalDiagStreamO, kDCEDebug, 1) 

« ''DCEQueryPredictionStats::EslimateElapsedTime() " << endl 
« " Searching for nearest existing stat by cost. Found 
cost: " 

« (*nearestCost)->cost « endl 
« OLDIAGEND; 
// first, find the maximum cost offset to be considered within 
25 // the existing stats 

DCENumber maxCostOffset (0); 
{ for (costltr = nearestCost, offset = 0; offset < 5; 

costltr-, offset++) 
{ DCENumber costDifference - cost - (*costItr)->cost; 
costDiff ere nee . Abs 0 ; 
30 if ( costDifference > maxCostOffset ) 

maxCostOffset - costDifference; 
if ( costltr — stats By CosLbeginO) 
break; } 
if (nearestCost !- statsByCost.endO) 
{ for (costltr = nearestCost, costItr++, offset = 1; 
35 offset < 5 && costltr !- statsByCost.endO; 

cost!tr++, offset++) 
{ DCENumber costDifference - cost - (*costItr)->cost; 
costDifference.AbsO; 
if ( costDifference > maxCostOffset ) 

maxCostOffset - costDifference; } } 
maxCostOffset++; 

OLDIAGSTRT (OLX::GetInternalDiagStream0, kDCEDebug, 1) 
« "DCEQueryPredictionStats::EstimateElapseaTimeO M « endl 
« " Found max cost offset: " « maxCostOffset « endl 

« OLDIAGEND; } 

// second, consider the five nearest stats before (and including) 

// the closest one 

45 OLDIAGSTRT (OLX^GetlntemalDiagStreamO, kDCEDebug, 1) 

« ''DCEO^eryPredictionStats-EstimatcElapsedTimeQ" « endl 
« " Calculating weighted averages of cost & time..." « 
endl 

« OLDIAGEND; 

for (costltr - nearestCost, offset - 0; offset < 5; costltr--, 
50 offset ++) 

{ DCENumber costDifference «* cost - (*costItr)->cost; 
costDifference.AbsO; 

DCENumber costOffset (OLNumber (maxCostOffset) - 
OLNumber (costDifference)) ; 

DCENumber time - ( mUseCpuTime ) ? 
55 ((*costItr)->actualCpuTime) : 
((*costItr)->actualElapsedTime); 

if ( Uime.is__nulI0 ) 
{ costRunningTotal +» (*costItr)->cost • OLNumber(costOffset); 

timeRunningTotal +- time 
OLNumber(costOffset); 

divider +- costOffset; 

OLDIAGSTRT (OLX::GetInternalDiagstream0, kDCEDebug, 2) 
« " Using cost: " « (*costltr)->cost 
« 14 , stat offset: " « offset 
« cost offset: " « costDifference 
« endl 

« OLDIAGEND; } 
65 if (costltr "» statsByCost.beginO) 
break; } 
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// then, consider the five nearest stats after (but not 
// including) the closest one 
if (nearestCost !- statsByCost.endO) 
{ for (costltr = nearestCost, costltr++, offset o 1; 

offset < 5 && costltr != statsByCost.endO; 
costItr++, offset++) 
{ DCENumber costDifference = cost - (*costItr)->cost; 
costDifference. Abs0; 

DCENumber costOffset (OLNumber(maxCostOffset) - 
OLNumber (costDifference)); 

DCENumber time = ( mUseCpuTime ) ? 
((•costltr)- >actualCpuTime) : 
((*costIlr)->aetualElapsedTime); 

if ( !time.is_null() ) 

{ costRunmngTotal +« (*costItr)->cost * 
OLNumber(costOffset) ; 
time Run ningTota! += time * OLNumber(costOffset); 
divider += costOffset; 

OLDIAGSTRT (OLX::GetInternalDiagStreamO, kDCEDebug, 2) 
« " Using cost: " « ( "costltr)- >cost 
« stat offset: " « offset 
« cost offset: " « costDifference 
« endl 

« OLDIAGEND; } } } 
// Now perform the calculations for weighted avgs 

DCENumber weighted Cost = costRunningTotal / OLNumber(divider); 

DCENumber weightedTime » timeRunningTotal / OLNumber (divider); 
OLDIAGSTRT (OLX::GetInternalDiagStream(), kDcEDebug, 1) 



"DCEQjieryPrediciionStats::EstimateElapsedTirne()" 



z endl 
« endl 
« endl 
« endl 
« endl 
« endl 



Cost Running Total: " « costRunningTotal 
« " Time Running Total: " « timeRunningTotal 
« " Divider: " « divider 

« " Weighted Cost: " « weightedCost 

« " Weighted Time: " « weightedTime 

« OLDIAGEND; 
// Estimate the time 

DCENumber estTime = (cost / weightedCost) * weightedTime; 
// Finally, estimate the elapsed time 

DCENumber estElapsedTime = ( estTime == 0 || weigh tedTimeRatio 
==0) 

? (OLNumber(O)) 

: ((OLNumber)estTime / 

(OLNumber) weightedTime Ratio) ; 

OLDLAGSTRT (OLX::GetInternaIDiagStreamQ 3 kDCEDebug, 1) 
« "DCEQuery Predictions tats ::EstimateElapsedTimeO" << endl 
« " Estimated Time: " « estTime « endl 

« ** Estimated Elapsed Time: " « estElapsedTime « endl 
« OLDIAGEND; } 
if(estElapsedTime<DCENumber(DCE_MIN_SENSIBLE_ESTIMATE)) 
{ estElapsedTime = DCE_MIN_S ENS IB LE_ESTTMATE; 

OLDIAGSTRT (OLX::GetInternalDiagStreamO, kDCEDebug, 1) 
« "Adjusting estimate to minimum sensible value: " 
« estElapsedTime « endl 
« OLDIAGEND; } 

# ifdef _DEBUG 

long debugTime = estElapsedTime; 

# endif 

return estElapsedTime; } 
// DCEOueryPredictionStats::EstimateElapsedTimeO 
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Thus, the invention is not to be limited except in accor- 
dance with the appended claims. 
What is claimed is: 

1. A method for estimating the time for a computer system 
to respond to a database query, comprising: 

determining an estimated cost for executing the database 
query; 

identifying, using the estimated cost, previously executed 
queries comprising characteristics the same as, or simi- 
lar to, the database query; and 

deriving an estimate of the time required for responding 
to the database query based on information associated 
with the previously executed queries. 

2. The method of claim 1, in which identifying previously 
executed queries comprises comparing specific data sought 
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in the database query with specific data sought in queries 
previously executed. 

3. The method of claim 1, in which deriving an estimate 
of the time required for responding to the database query 
comprises 

generating an estimated processing time for the database 
query using an actual CPU processing time for a 
previously executed query, and 

applying a current ratio to the estimated processing time, 
the current ratio comprising a ratio of elapsed time for 
the computer system to respond to one or more prior 
queries to actual CPU time to process the one or more 
prior queries. 

4. The method of claim 3, in which the actual CPU 
processing time for a previously executed query comprises 
the actual CPU processing time to process a previously 
executed query identified as comprising characteristics the 
same as, or similar to, the database query, and the one or 
more prior queries comprises one or more of the most 
current executed database queries. 

5. The method of claim 4, in which the current ratio 
further comprises a weighted average of elapsed time to 
actual CPU time. 

6. A method for estimating the time required for a 
computer system to respond to a present query, comprising: 

identifying a plurality of previously executed queries 
comprising characteristics the same as, or similar to, 
the present query; and 

deriving an estimate of the time required for responding 
to the present query based on information associated 
with the plurality of previously executed queries. 

7. The method of claim 6, in which identifying a plurality 
of previously executed queries comprises 

comparing an estimated cost for executing the present 
query with estimated costs of executing a plurality of 
prior queries, and 

comparing a result set of the present query with result sets 
of the plurality of prior queries, a result set comprising 
an identification of the tables accessed in a database to 
respond to the respective present query or respective 
prior query. 

8. The method of claim 6, further comprising determining 
an estimated cost for executing the present query, and in 
which deriving an estimate of the time required for respond- 
ing to the present query comprises determining whether a 
prior executed query has an identical estimated cost as the 
estimated cost for executing the present query. 

9. The method of claim 6, in which deriving an estimate 
of the time required for responding to the present query 
comprises 

generating an estimated processing time for the present 
query using an actual CPU processing time of a pre- 
viously executed query, and 

applying a current ratio to the estimated processing time, 
the current ratio comprising a ratio of elapsed times to 
actual times, the elapsed times comprising the times for 
the computer system to respond to a set of one or more 
prior queries and the actual times comprising the actual 
CPU processing times of the set of one or more prior 
queries. 

10. The method of claim 8, in which deriving an estimate 
of the time required for responding to the present query 
further comprises 

extrapolating an estimated processing time for executing 
the present query based on the actual processing times 
of the plurality of previously executed queries, if no 
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prior executed query has an estimated cost equal to the 
estimated cost for executing the present query. 

11. The method of claim 10, in which extrapolating an 
estimated processing time comprises determining an esti- 
mated processing time based on a weighted average of the 
actual processing times of the plurality of previously 
executed queries. 

12. A machine readable medium having stored thereon a 
program for causing a computer to: 

determine an estimated cost for executing a present query; 

identify, using the estimated cost, one or more previously 

executed queries comprising characteristics the same 

as, or similar to, the present query; and 

derive an estimate of the time required for responding to 
the present query based on information associated with 
the one or more previously executed queries! 

13. A machine readable medium having stored thereon a 
program for causing a computer to: 

identify one or more previously executed queries com- 
prising characteristics the same as, or similar to, a 
present query; and 

derive an estimate of the time to respond to the present 
query based on information associated with the one or 
more previously executed queries. 

14. The machine readable medium of claim 13, in which 
to identify one or more previously executed queries, the 
program further causes the computer to 

compare an estimated cost for executing the present query 
with estimated costs of executing a plurality of prior 
queries, and 

compare a result set of the present query with result sets 
of the plurality of prior queries, a result set comprising 
an identification of the database tables accessed to 
respond to the respective present query or respective 4 o 
prior query. 

15. The machine readable medium of claim 13, in which 
to derive an estimate of the time to respond to the present 
query, the program further causes the computer to 

generate an estimated processing time for the present 
query using an actual CPU processing time of a pre- 
viously executed query, and 

apply a current ratio to the estimated processing time, the 
current ratio comprising a ratio of elapsed time to 
actual CPU time, the elapsed time comprising one or 
more elapsed times for the computer system to respond 
to one or more prior executed queries, and the actual 
CPU time comprising one or more actual CPU pro- 
cessing times of the one or more prior executed queries. 
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16. A computer system, comprising: 
a computer; and 

a data storage device, said data storage device comprising 
a program residing thereon for causing said computer 
to estimate a time for the computer system to respond 
to a present database query by 
determining an estimated cost for executing the present 

database query, 
identifying, using the estimated cost, previously 
executed queries comprising characteristics the same 
as, or similar to, the present database query, and 
deriving an estimate of the time required for responding 
to the present database query based on information 
associated with the previously executed queries. 

17. A computer system, comprising: 
a computer; and 

a data storage device, said data storage device comprising 
a program residing thereon for causing said computer 
to estimate a time for the computer system to respond 
to a present query by 

identifying a previously executed query comprising 
characteristics the same as, or similar to, the present 
query, and 

deriving an estimate of a time for responding to the 
present query based on information associated with 
the previously executed query. 

18. The computer system of claim 17, in which said 
program further causes said computer to 

compare an estimated cost for executing the present query 
with estimated costs of executing a plurality of prior 
queries, and 

compare a result set of the present query with result sets 
of the plurality of prior queries, a result set comprising 
an identification of the database tables accessed to 
respond to the respective present query or respective 
prior query, 

19. The computer system of claim 18, in which said 
program further causes said computer to 

generate an estimated processing time for the present 
query using an actual CPU processing time of a pre- 
viously executed query, and 

apply a current ratio to the estimated processing time, the 
current ratio comprising a ratio of one or more elapsed 
times for the computer system to respond to a set of 
prior queries to one or more actual CPU processing 
times for the set of prior queries. 

20. The computer system of claim 19, in which the actual 
CPU processing time of a previously executed query com- 
prises the actual CPU processing time to process the previ- 
ously executed query comprising characteristics the same as, 
or similar to, the present query, and the set of prior queries 
comprises one or more of the most current executed queries. 
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